71 research outputs found

    Interdependent Privacy

    PrivaTree: Collaborative Privacy-Preserving Training of Decision Trees on Biomedical Data

    Biomedical data generation and collection have become faster and more ubiquitous. Consequently, datasets are increasingly spread across hospitals, research institutions, or other entities. Exploiting such distributed datasets simultaneously can be beneficial; in particular, classification using machine learning models such as decision trees is becoming increasingly common and important. However, because biomedical data is highly sensitive, sharing data records across entities or centralizing them in one location is often prohibited due to privacy concerns or regulations. We design PrivaTree, an efficient and privacy-preserving protocol for collaborative training of decision tree models on distributed, horizontally partitioned biomedical datasets. Although decision tree models may not always be as accurate as neural networks, they have better interpretability and are helpful in decision-making processes, which are crucial for biomedical applications. PrivaTree follows a federated learning approach in which raw data is not shared and every data provider computes, on its private dataset, updates to a global decision tree model being trained. These updates are then aggregated in a privacy-preserving way using additive secret-sharing, in order to collaboratively update the model. We implement PrivaTree and evaluate its computational and communication efficiency on three different biomedical datasets, as well as the accuracy of the resulting models. Compared to the model centrally trained on all data records, the obtained collaborative model presents a modest loss of accuracy, while consistently outperforming the local models trained separately by each data provider. Moreover, PrivaTree is more efficient than existing solutions, which makes it usable for training decision trees with numerous nodes on large, complex datasets with both continuous and categorical attributes, as often found in the biomedical field.
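The aggregation step mentioned in the abstract above, additive secret-sharing, can be illustrated with a minimal sketch. This is not PrivaTree's actual protocol: the modulus, the function names (`share`, `reconstruct`, `secure_sum`) and the single-process simulation of all parties are assumptions made purely for illustration. Each provider splits its private non-negative integer (for example, a per-class count at a tree node) into random shares that sum to the value modulo a fixed number, so only the total is ever reconstructed.

```python
import secrets

# Assumed modulus for illustration; it must exceed the largest possible total.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split a non-negative integer into n_parties additive shares mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value mod modulus.
    shares.append((value - sum(shares)) % modulus)
    return shares


def reconstruct(shares, modulus=MODULUS):
    """Recombine additive shares into the shared value."""
    return sum(shares) % modulus


def secure_sum(private_values, modulus=MODULUS):
    """Simulate, in one process, n providers summing their private values.

    Provider i sends its j-th share to party j. Each party adds the shares
    it received, and only these partial sums are combined at the end.
    """
    n = len(private_values)
    distributed = [share(v, n, modulus) for v in private_values]
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    return reconstruct(partial_sums, modulus)


if __name__ == "__main__":
    # Hypothetical per-hospital counts for one class at one tree node.
    print(secure_sum([12, 30, 7]))
```

Any single share is a uniformly random number, so a party that sees only some of the shares learns nothing about an individual provider's count.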

    On the Role and Form of Personal Information Disclosure in Cyberbullying Incidents

    Disclosing personal information significantly increases the likelihood of cyberbullying incidents, which makes it important to investigate the relationships between the various stakeholders in such incidents. Our objective is to gain insights into the roles of different stakeholders and into the types and typical paths of personal information in cyberbullying incidents. To achieve this, we conducted a large-scale survey with a representative sample of internet users from the United States and Nigeria (N = 1555). Our findings indicate that cyberbullying is often fueled by personal information that becomes known to other stakeholders directly or through social media. Additionally, cyberbullying incidents involve more than just attackers and victims; they can involve other stakeholders as third-party ‘disclosers’. Both strangers and friends typically engage in such activity. Finally, cyberbullying incidents are twice as common in Nigeria as in the United States. Our findings have implications for design, social media literacy programs, and policy.

    An Empirical Study of the Usage of Checksums for Web Downloads

    Checksums, typically generated from cryptographic hash functions (e.g., MD5, SHA256) or signature schemes (e.g., PGP), are commonly provided on webpages to enable users to verify that the files they download have not been tampered with when stored on possibly untrusted servers. In this paper, we shed light on the current practices regarding the usage of checksums for web downloads (hash functions used, visibility and validity of checksums, type of websites and files, presence of instructions, etc.), as this has been mostly overlooked so far. Using a snowball-sampling strategy for the 200,000 most popular domains of the Web, we first crawled a dataset of 8.5M webpages, from which we built, through an active-learning approach, a unique dataset of 277 diverse webpages that contain checksums. Our analysis of these webpages reveals interesting findings about the usage of checksums. For instance, it shows that checksums are used mostly to verify program files, that weak hash functions are frequently used, and that a non-negligible proportion of the checksums provided on webpages do not match those of their associated files. We make our dataset and the code for collecting and analyzing it freely available. Finally, we complement our analysis with a survey of the webmasters of the considered webpages (26 complete responses), shedding light on the reasons behind the checksum-related choices they make.

    Short paper: Cheat Detection and Prevention in P2P MOGs

    In peer-to-peer games, cheaters can easily disrupt the game state computation and dissemination, perform illegal actions, and unduly gain access to sensitive information. We propose AntiCheat, a cheat detection and prevention protocol that follows a mutual verification approach complemented with information exposure mitigation. It is based on a randomized dynamic proxy scheme for both the dissemination and verification of actions, and it further reduces the information exposed to players to close to the minimum required to render the game. We build a proof-of-concept prototype on top of Quake III. Experiments with up to 48 players show that opportunities to cheat can be significantly reduced, even in the presence of colluding cheaters, while keeping good performance.

    “I thought you were okay”: Participatory Design with Young Adults to Fight Multiparty Privacy Conflicts in Online Social Networks

    While sharing multimedia content on Online Social Networks (OSNs) has many benefits, exposing other people without obtaining permission could cause Multiparty Privacy Conflicts (MPCs). Earlier studies developed technical solutions and dissuasive approaches to address MPCs. However, none of these studies involved OSN users who have experienced MPCs in the design process, possibly overlooking the valuable experiences these individuals might have accrued. To fill this gap, we recruited participants specifically from this population of users and involved them in participatory design sessions aimed at ideating solutions to reduce the incidence of MPCs. To frame the activities of our participants, we borrowed terminology and concepts from a well-known framework used in justice systems. Over the course of several design sessions, our participants designed 10 solutions to mitigate MPCs. The designed solutions leverage different mechanisms, including preventing MPCs from happening, dissuading users from sharing, mending the harm, and educating users about the community standards. We discuss the open design and research opportunities suggested by the designed solutions, and we contribute an ideal workflow that synthesizes the best of each solution. This study contributes to the innovation of privacy-enhancing technologies to limit the incidence of MPCs in OSNs.

    A Study on the Use of Checksums for Integrity Verification of Web Downloads

    App stores provide access to millions of different programs that users can download on their computers. Developers can also make their programs available for download on their websites and host the program files either directly on their website or on third-party platforms, such as mirrors. In the latter case, as users download the software without any vetting from the developers, they should take the necessary precautions to ensure that it is authentic. One way to accomplish this is to check that the integrity verification code (the checksum) of the downloaded file matches the one published for it, if provided. To date, however, there is little evidence to suggest that such a process is effective. Even worse, very few usability studies about it exist. In this paper, we provide the first comprehensive study that assesses the usability and effectiveness of the manual checksum verification process. First, by means of an in-situ experiment with 40 participants and eye-tracking technology, we show that the process is cumbersome and error-prone. Second, after a 4-month-long in-the-wild experiment with 134 participants, we demonstrate how our proposed solution, a Chrome extension that verifies checksums automatically, significantly reduces human errors, improves coverage, and has only limited impact on usability. It also confirms that, sadly, only a tiny minority of websites that link to executable files in our sample provide checksums (0.01%), which is a strong call to action for web standards bodies, service providers, and content creators to increase the use of file integrity verification on their properties.
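The checksum comparison that the abstract above describes as cumbersome when done by hand can be sketched in a few lines. This is a minimal illustration rather than the authors' Chrome extension: the function names `file_digest` and `matches_checksum` and the default of SHA-256 are assumptions, and only standard `hashlib` calls are used.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to bound memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_checksum(path, published_hex, algorithm="sha256"):
    """Compare a file's digest with a published one.

    Surrounding whitespace and letter case in the published value are
    ignored, since webpages present checksums inconsistently.
    """
    expected = published_hex.strip().lower()
    return file_digest(path, algorithm) == expected
```

A matching checksum only shows that the file equals the one the checksum was computed from. If the checksum is hosted on the same compromised server as the file, the comparison gives no protection.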

    Are Those Steps Worth Your Privacy? Fitness-Tracker Users' Perceptions of Privacy and Utility

    Fitness trackers are increasingly popular. The data they collect provides substantial benefits to their users, but it also creates privacy risks. In this work, we investigate how fitness-tracker users perceive the utility of the features the trackers provide and the associated privacy-inference risks. We conduct a longitudinal study composed of a four-month period of fitness-tracker use (N = 227), followed by an online survey (N = 227) and interviews (N = 19). We assess the users’ knowledge of concrete privacy threats that fitness-tracker users are exposed to (as demonstrated by previous work), possible privacy-preserving actions users can take, and perceptions of utility of the features provided by the fitness trackers. We study the potential for data minimization and the users’ mental models of how the fitness tracking ecosystem works. Our findings show that the participants are aware that some types of information might be inferred from the data collected by the fitness trackers. For instance, the participants correctly guessed that sexual activity could be inferred from heart-rate data. However, the participants did not realize that non-physiological information could also be inferred from the data. Our findings demonstrate a high potential for data minimization, either by processing data locally or by decreasing the temporal granularity of the data sent to the service provider. Furthermore, we identify the participants’ lack of understanding and common misconceptions about how the Fitbit ecosystem works.
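One of the data-minimization options named in the abstract above, decreasing the temporal granularity of the data sent to the service provider, can be sketched as local averaging over fixed time windows. The function name `coarsen`, the `(timestamp, bpm)` sample format, and the 15-minute default window are assumptions for illustration only; the study does not prescribe a specific scheme.

```python
from collections import defaultdict


def coarsen(samples, window_seconds=900):
    """Average (unix_timestamp, bpm) samples over fixed-size time windows.

    Returns a list of (window_start, mean_bpm) pairs sorted by time, so that
    only one value per window would leave the device instead of every reading.
    """
    buckets = defaultdict(list)
    for timestamp, bpm in samples:
        # Align each sample to the start of the window that contains it.
        buckets[timestamp - timestamp % window_seconds].append(bpm)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]
```

Larger windows hide short-lived events in the heart-rate signal at the cost of coarser features, which is the utility trade-off the study examines.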

    Inferring Social Ties in Pervasive Networks: An On-Campus Comparative Study

    WiFi base stations are increasingly deployed in both public spaces and private companies, and the increase in their density poses a significant threat to the privacy of users. Prior studies have shown that it is possible to infer the social ties between users from their (co-)location traces, but they lack one important component: a comparison, in the same field trial, of the inference accuracy of an internal attacker (e.g., a curious application running on the device) and that of a realistic external eavesdropper (e.g., a network of sniffing stations). We experimentally show that such an eavesdropper can infer the type of social ties between mobile users better than an internal attacker.

    Content and Geographical Locality in User-Generated Content Sharing Systems

    User Generated Content (UGC), such as YouTube videos, accounts for a substantial fraction of Internet traffic. To optimize their performance, UGC services usually rely on both proactive and reactive approaches that exploit spatial and temporal locality in access patterns. Alternative types of locality are also relevant but hardly ever considered together. In this paper, we show on a large YouTube dataset (more than 650,000 videos) that content locality (induced by the related videos feature) and geographic locality are in fact correlated. More specifically, we show how the geographic view distribution of a video can be inferred to a large extent from that of its related videos. We leverage these findings to propose a UGC storage system that proactively places videos close to the expected requests. Compared to a caching-based solution, our system reduces by 16% the number of requests served from a country other than that of the requesting user, and even in this case, the distance between the user and the server is 29% shorter on average.